Bounds on Query Convergence
The problem of finding an optimum using noisy evaluations of a smooth cost function arises in many contexts, including economics, business, medicine, experiment design, and foraging theory. We derive an asymptotic bound E[(x_t - x*)^2] >= O(1/sqrt(t)) on the rate of convergence of a sequence (x_0, x_1, ...) generated by an unbiased feedback process observing noisy evaluations of an unknown quadratic function maximised at x*. The bound is tight, as the proof leads to a simple algorithm which meets it. We further establish a bound on the total regret, E[sum_{i=1..t} (x_i - x*)^2] >= O(sqrt(t)). These bounds may impose practical limitations on an agent's performance, as O(eps^-4) queries must be made before the queries converge to x* with accuracy eps.
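The abstract does not spell out the matching algorithm, but a Kiefer-Wolfowitz-style stochastic approximation is a standard scheme that attains the stated O(1/sqrt(t)) mean-squared-error rate on a noisy quadratic. The Python sketch below is a minimal illustration under that assumption; the step-size and probe-width schedules are conventional choices, not taken from the paper.

    import numpy as np

    # Minimal sketch (not the paper's algorithm): Kiefer-Wolfowitz-style
    # stochastic approximation maximising a noisy quadratic
    # f(x) = -(x - x_star)^2 + noise. Step sizes a_t ~ 1/t and probe
    # widths c_t ~ t^(-1/4) are the classical choices giving
    # E[(x_t - x_star)^2] ~ 1/sqrt(t), i.e. O(eps^-4) queries for
    # accuracy eps.

    rng = np.random.default_rng(0)
    x_star = 1.7  # unknown maximiser (illustrative value)

    def noisy_f(x):
        return -(x - x_star) ** 2 + rng.normal(scale=0.1)

    x = 0.0
    for t in range(1, 100001):
        a_t = 1.0 / t        # step size
        c_t = t ** -0.25     # finite-difference probe width
        grad_est = (noisy_f(x + c_t) - noisy_f(x - c_t)) / (2 * c_t)
        x += a_t * grad_est  # ascend the estimated gradient

    print(f"x after 2e5 queries: {x:.3f} (x* = {x_star})")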
Automatic Differentiation of Algorithms for Machine Learning
Automatic differentiation---the mechanical transformation of numeric computer
programs to calculate derivatives efficiently and accurately---dates to the
origin of the computer age. Reverse mode automatic differentiation both
antedates and generalizes the method of backwards propagation of errors used in
machine learning. Despite this, practitioners in a variety of fields, including
machine learning, have been little influenced by automatic differentiation, and
make scant use of available tools. Here we review the technique of automatic
differentiation, describe its two main modes, and explain how it can benefit
machine learning practitioners. To reach the widest possible audience our
treatment assumes only elementary differential calculus, and does not assume
any knowledge of linear algebra.
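As a concrete illustration of the forward mode mentioned above: a dual number carries a value together with its derivative, and overloaded arithmetic propagates both through a program by the chain rule. The Python sketch below is illustrative only, not code from the paper.

    # Forward-mode AD via dual numbers (illustrative sketch).

    class Dual:
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dot + other.dot)

        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            # product rule: (uv)' = u'v + uv'
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)

        __rmul__ = __mul__

    def f(x):
        return 3 * x * x + 2 * x + 1  # f'(x) = 6x + 2

    y = f(Dual(4.0, 1.0))  # seed the input's derivative with 1
    print(y.val, y.dot)    # 57.0 26.0, i.e. f(4) and f'(4)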
Soft-LOST: EM on a Mixture of Oriented Lines
Robust clustering of data into overlapping linear subspaces
is a common problem. Here we consider one-dimensional subspaces that
cross the origin. This problem arises in blind source separation, where
the subspaces correspond directly to columns of a mixing matrix. We
present an algorithm that identifies these subspaces using an EM procedure,
where the E-step calculates posterior probabilities assigning data
points to lines and the M-step repositions the lines to match the points
assigned to them. This method, combined with a transformation into a
sparse domain and an L1-norm optimisation, constitutes a blind source
separation algorithm for the under-determined case.
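A minimal Python sketch of that E/M alternation follows; it is an illustration of the description above, not the authors' implementation. Lines through the origin are represented by unit direction vectors, the assignment sharpness beta is an assumed parameter, and each M-step takes the dominant eigenvector of the weighted scatter matrix.

    import numpy as np

    def soft_lost(X, n_lines, n_iter=50, beta=10.0, seed=0):
        # Fit n_lines one-dimensional subspaces through the origin to X.
        rng = np.random.default_rng(seed)
        V = rng.normal(size=(n_lines, X.shape[1]))
        V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit directions
        for _ in range(n_iter):
            # E-step: posterior responsibilities from squared distances
            # of each point to each line.
            proj = X @ V.T
            dist2 = (X ** 2).sum(1, keepdims=True) - proj ** 2
            R = np.exp(-beta * (dist2 - dist2.min(1, keepdims=True)))
            R /= R.sum(1, keepdims=True)
            # M-step: reposition each line as the principal axis of the
            # points softly assigned to it.
            for k in range(n_lines):
                S = (X * R[:, k:k + 1]).T @ X           # weighted scatter
                _, U = np.linalg.eigh(S)
                V[k] = U[:, -1]                         # top eigenvector
        return V  # estimated mixing-matrix columns, up to sign and scale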
An Analysis of Publication Venues for Automatic Differentiation Research
We present the results of our analysis of publication venues for papers on
automatic differentiation (AD), covering academic journals and conference
proceedings. Our data are collected from the AD publications database
maintained by the autodiff.org community website. The database is purpose-built
for the AD field and is expanding via submissions by AD researchers. Therefore,
it provides a relatively noise-free list of publications relating to the field.
However, it does include noise in the form of variant spellings of journal and
conference names. We handle this by manually correcting and merging these
variants under the official names of corresponding venues. We also share the
raw data we get after these corrections.
Simplifying Neural Networks by Soft Weight-Sharing
The abstract is included in the text
Algorithmic Differentiation, Functional Programming, and Iterate-to-Fixedpoint
Abstract included in text
Gradient Descent: Second-Order Momentum and Saturating Error
Batch gradient descent, Delta w(t) = -eta dE/dw(t), converges to a minimum of quadratic form with a time constant no better than (1/4) lambda_max/lambda_min, where lambda_min and lambda_max are the minimum and maximum eigenvalues of the Hessian matrix of E with respect to w. It was recently shown that adding a momentum term, Delta w(t) = -eta dE/dw(t) + alpha Delta w(t-1), improves this to approximately sqrt(lambda_max/lambda_min), although only in the batch case. Here we show that second-order momentum, Delta w(t) = -eta dE/dw(t) + alpha Delta w(t-1) + beta Delta w(t-2), can lower this no further. We then regard gradient descent with momentum as a dynamic system and explore a non-quadratic error surface, showing that saturation of the error accounts for a variety of effects observed in simulations and justifies some popular heuristics.
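The gap between the two rates is easy to see numerically. The demo below (illustrative, not from the paper) runs plain and first-order-momentum gradient descent on a quadratic with Hessian eigenvalues 1 and 100, each with its classical optimal settings.

    import numpy as np

    lam = np.array([1.0, 100.0])   # Hessian eigenvalues (lambda_min, lambda_max)

    def run(eta, alpha, steps=200):
        w = np.array([1.0, 1.0])   # start away from the minimum at 0
        w_prev = w.copy()
        for _ in range(steps):
            grad = lam * w         # gradient of E(w) = 0.5 * sum_i lam_i w_i^2
            w, w_prev = w - eta * grad + alpha * (w - w_prev), w
        return np.linalg.norm(w)

    kappa = lam.max() / lam.min()
    sk = np.sqrt(kappa)
    print("plain GD:     ", run(eta=2 / (lam.min() + lam.max()), alpha=0.0))
    print("with momentum:", run(eta=(2 / (np.sqrt(lam.min()) + np.sqrt(lam.max()))) ** 2,
                                alpha=((sk - 1) / (sk + 1)) ** 2))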
Comments on 'Hebbian learning is jointly controlled by electrotonic and input structure'
It is argued that the simulations presented by Tsai, Carnevale and Brown do not agree with their theoretical predictions, and that their mathematical derivation contains a major flaw. The origin of these misunderstandings is traced to the application of a special case of an equation whose general version is given here.
AD in Fortran, Part 2: Implementation via Prepreprocessor
We describe an implementation of the Farfel Fortran AD extensions. These
extensions integrate forward and reverse AD directly into the programming
model, with attendant benefits to flexibility, modularity, and ease of use. The
implementation we describe is a "prepreprocessor" that generates input to
existing Fortran-based AD tools. In essence, blocks of code which are targeted
for AD by Farfel constructs are put into subprograms which capture their
lexical variable context, and these are closure-converted into top-level
subprograms and specialized to eliminate EXTERNAL arguments, rendering them
amenable to existing AD preprocessors, which are then invoked, possibly
repeatedly if the AD is nested.
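The closure-conversion step can be pictured in a language-neutral way. The hypothetical Python analogue below (the actual tool operates on Fortran) lifts a block that captures variables from its lexical context into a top-level subprogram taking those variables explicitly, which is the form existing AD preprocessors can handle.

    # Before: the block captures a and b from its enclosing scope.
    def caller_before():
        a, b = 2.0, 3.0
        def block(x):              # free variables: a, b
            return a * x + b
        return block(4.0)

    # After closure conversion: free variables become explicit arguments,
    # so the subprogram stands alone at top level.
    def block_lifted(x, a, b):
        return a * x + b

    def caller_after():
        a, b = 2.0, 3.0
        return block_lifted(4.0, a, b)

    assert caller_before() == caller_after()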
- …